@iliailmer iliailmer commented Oct 12, 2025

This PR adds a Metal-based implementation of the CONV_TRANSPOSE_2D operation (#14909).

TODO:

  • Tests

@iliailmer iliailmer marked this pull request as ready for review October 12, 2025 20:17
@iliailmer iliailmer requested a review from ggerganov as a code owner October 12, 2025 20:17
@github-actions github-actions bot added the ggml and Apple Metal labels Oct 12, 2025
Member

@ggerganov ggerganov left a comment

Add the necessary requirements for the input tensors here:

    case GGML_OP_CONV_TRANSPOSE_1D:
    case GGML_OP_CONV_TRANSPOSE_2D:
        return true;

For example, the implementation assumes that src0 and src1 are contiguous.
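
To make the request concrete, here is a minimal, self-contained sketch of the kind of guard being asked for. `TensorInfo`, `DType`, `is_contiguous`, `make_dense`, and `supports_conv_transpose_2d` are hypothetical stand-ins rather than ggml's real types or functions, and the type combination (F16 kernel, F32 input, F32 output) is an assumption inferred from the kernel name `kernel_conv_transpose_2d_f16_f32`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for ggml's tensor metadata; not the real ggml types.
enum class DType { F16, F32 };

struct TensorInfo {
    DType   type;
    int64_t ne[4];  // number of elements per dimension
    size_t  nb[4];  // stride in bytes per dimension
};

static size_t dtype_size(DType t) { return t == DType::F16 ? 2 : 4; }

// Dense layout: nb[0] is the element size and every further stride is the
// previous stride multiplied by the previous extent.
static bool is_contiguous(const TensorInfo & t) {
    size_t expected = dtype_size(t.type);
    for (int i = 0; i < 4; ++i) {
        if (t.nb[i] != expected) {
            return false;
        }
        expected *= (size_t) t.ne[i];
    }
    return true;
}

// Assumed requirements: src0 (kernel) is F16, src1 (input) and dst are F32,
// and both inputs are contiguous.
static bool supports_conv_transpose_2d(const TensorInfo & src0,
                                       const TensorInfo & src1,
                                       DType dst_type) {
    return is_contiguous(src0) && is_contiguous(src1) &&
           src0.type == DType::F16 &&
           src1.type == DType::F32 &&
           dst_type  == DType::F32;
}

// Helper that builds a densely packed tensor description for experiments.
static TensorInfo make_dense(DType type, int64_t n0, int64_t n1, int64_t n2, int64_t n3) {
    assert(n0 > 0 && n1 > 0 && n2 > 0 && n3 > 0);
    TensorInfo t{type, {n0, n1, n2, n3}, {0, 0, 0, 0}};
    size_t stride = dtype_size(type);
    for (int i = 0; i < 4; ++i) {
        t.nb[i] = stride;
        stride *= (size_t) t.ne[i];
    }
    return t;
}
```

With this sketch, a dense F16 kernel paired with a dense F32 input passes the check, while a permuted (non-contiguous) input or an F32 kernel is rejected, so the backend would fall back instead of running a kernel on a layout it does not handle.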

@iliailmer iliailmer changed the title Add metal conv transpose 2d Add CONV_TRANSPOSE_2D for Metal Oct 13, 2025
@iliailmer
Author

Added the type checks and the is_contiguous checks as well.

@iliailmer iliailmer requested a review from ggerganov October 14, 2025 00:23
Member

@ggerganov ggerganov left a comment

I wonder if it would be more efficient to have KH x KW threads in each threadgroup, instead of just 1.

Would you like to try that in this PR or in a follow-up PR?
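
One way to read the suggestion, sketched on the CPU: each output element is a sum over KH x KW kernel taps, so each tap could be handled by its own thread and the partial results reduced afterwards. The code below is only an illustration of that decomposition, not the PR's Metal kernel. `Shape`, `tap_partial`, and `output_element` are hypothetical names, the layouts (input `[IW, IH, IC]`, kernel `[KW, KH, OC, IC]`, first dimension fastest) are assumed from the test cases, and everything is kept in `float` for simplicity. On the GPU the final loop would presumably become a threadgroup-level reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape bundle for a single batch element, zero padding.
struct Shape {
    int IW, IH, IC;  // input width, height, channels
    int KW, KH, OC;  // kernel width, height, output channels
    int stride;
    int OW() const { return (IW - 1) * stride + KW; }
    int OH() const { return (IH - 1) * stride + KH; }
};

// Contribution of one kernel tap (kx, ky) to output element (ox, oy, oc),
// summed over input channels. This is the work a single thread of a
// KH x KW threadgroup would do in the assumed scheme.
static float tap_partial(const Shape & s,
                         const std::vector<float> & in,
                         const std::vector<float> & w,
                         int ox, int oy, int oc, int kx, int ky) {
    const int dx = ox - kx;
    const int dy = oy - ky;
    // The tap contributes only if it maps back onto a real input pixel.
    if (dx < 0 || dy < 0 || dx % s.stride != 0 || dy % s.stride != 0) {
        return 0.0f;
    }
    const int ix = dx / s.stride;
    const int iy = dy / s.stride;
    if (ix >= s.IW || iy >= s.IH) {
        return 0.0f;
    }
    float acc = 0.0f;
    for (int ic = 0; ic < s.IC; ++ic) {
        const size_t i_idx = ((size_t) ic * s.IH + iy) * s.IW + ix;
        const size_t w_idx = (((size_t) ic * s.OC + oc) * s.KH + ky) * s.KW + kx;
        acc += in[i_idx] * w[w_idx];
    }
    return acc;
}

// One output element: collect KH*KW partial sums (one per "thread"),
// then reduce them serially.
static float output_element(const Shape & s,
                            const std::vector<float> & in,
                            const std::vector<float> & w,
                            int ox, int oy, int oc) {
    assert(ox >= 0 && ox < s.OW() && oy >= 0 && oy < s.OH());
    std::vector<float> partial((size_t) s.KH * s.KW, 0.0f);
    for (int ky = 0; ky < s.KH; ++ky) {
        for (int kx = 0; kx < s.KW; ++kx) {
            partial[(size_t) ky * s.KW + kx] = tap_partial(s, in, w, ox, oy, oc, kx, ky);
        }
    }
    float sum = 0.0f;
    for (float p : partial) {
        sum += p;
    }
    return sum;
}
```

With stride greater than 1, many taps contribute nothing to a given output element, so some threads in such a threadgroup would sit idle. Whether the extra parallelism outweighs that is something only a benchmark on the real kernel can answer.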

Co-authored-by: Georgi Gerganov <[email protected]>
@iliailmer
Author

Yeah, let me try it out in this one. I should be able to do it by the end of the week or so.

@iliailmer iliailmer requested a review from slaren as a code owner October 16, 2025 02:00
@github-actions github-actions bot added the testing label Oct 16, 2025
test_cases.emplace_back(new test_conv_2d_dw({512, 512, 256, 1}, {3, 3, 1, 256}, 1, 1, 1, true));

test_cases.emplace_back(new test_conv_transpose_2d({256, 256, 256, 1}, {3, 3, 16, 256}, 1));
test_cases.emplace_back(new test_conv_transpose_2d({16, 16, 16, 1}, {3, 3, 8, 16}, 1));
Author

I added these tests while developing; we can remove them if necessary.

@iliailmer
Author

The 256x256 performance test case still runs a bit too long, I think (on an M1 MacBook Pro, 16 GB), but the two smaller cases are showing good results:

ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_conv_transpose_2d_f16_f32', name = 'kernel_conv_transpose_2d_f16_f32'
ggml_metal_library_compile_pipeline: loaded kernel_conv_transpose_2d_f16_f32              0x100d50ac0 | th_max = 1024 | th_width =   32
  CONV_TRANSPOSE_2D(ne_input=[16,16,16,1],ne_kernel=[3,3,8,16],stride=1):               8192 runs -  7508.63 us/run -       28 kB/run -    0.00 GB/s
  CONV_TRANSPOSE_2D(ne_input=[10,10,9,1],ne_kernel=[3,3,1,9],stride=2):                 8192 runs -  3513.65 us/run -        5 kB/run -    0.00 GB/s
  Backend Metal: OK
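
As a sanity check on the kB/run column, the sketch below recomputes the output shape and the bytes touched per run for the two cases in the log. The accounting (F32 input, F16 kernel, F32 output, each counted once) is an assumption about how the benchmark measures memory, and `conv_transpose_out_size` and `bytes_per_run` are hypothetical helper names, not functions from the test harness.

```cpp
#include <cassert>
#include <cstdint>

// Output extent of a transposed convolution with zero padding:
// (in - 1) * stride + kernel.
static int64_t conv_transpose_out_size(int64_t in, int64_t k, int64_t stride) {
    assert(in > 0 && k > 0 && stride > 0);
    return (in - 1) * stride + k;
}

// Assumed accounting: F32 input (4 bytes), F16 kernel (2 bytes) and
// F32 output (4 bytes), each counted once.
// ne_in = [IW, IH, IC, N], ne_k = [KW, KH, OC, IC].
static int64_t bytes_per_run(const int64_t ne_in[4], const int64_t ne_k[4], int64_t stride) {
    const int64_t in_elems  = ne_in[0] * ne_in[1] * ne_in[2] * ne_in[3];
    const int64_t k_elems   = ne_k[0] * ne_k[1] * ne_k[2] * ne_k[3];
    const int64_t ow        = conv_transpose_out_size(ne_in[0], ne_k[0], stride);
    const int64_t oh        = conv_transpose_out_size(ne_in[1], ne_k[1], stride);
    const int64_t out_elems = ow * oh * ne_k[2] * ne_in[3];
    return in_elems * 4 + k_elems * 2 + out_elems * 4;
}
```

Under these assumptions the first case gives an 18x18x8 output and 29056 bytes (about 28.4 KiB), and the second a 21x21x1 output and 5526 bytes (about 5.4 KiB), which is consistent with the 28 kB/run and 5 kB/run in the log. At roughly 7.5 ms per run, 28 kB works out to a few MB/s, which is why the GB/s column rounds to 0.00.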
